Figures
Lecture 9
Review: t.test()
- One-sample t-test: t.test(x = sample, alternative = "two.sided")
- Two-sample t-test: t.test(x = sample1, y = sample2, alternative = "two.sided", var.equal = TRUE or FALSE)
- Paired t-test: t.test(x = sample1, y = sample2, alternative = "two.sided", paired = TRUE)
- In every case, alternative can be "two.sided", "greater", or "less"
- var.equal only matters for the two-sample test; it has no effect when paired = TRUE
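As a quick refresher, the three calls can be tried on the built-in sleep dataset. This is only a sketch: the names g1, g2, one_sample, two_sample, and paired_test are made up for illustration, and the one-sample test uses the default null value of 0.

```r
# Split the built-in sleep data into its two groups (g1 and g2 are illustrative names)
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# One-sample: is the mean of g1 different from 0 (the default null value)?
one_sample <- t.test(x = g1, alternative = "two.sided")

# Two-sample: var.equal = FALSE requests the Welch test
two_sample <- t.test(x = g1, y = g2, alternative = "two.sided", var.equal = FALSE)

# Paired: the same 10 subjects were measured in both groups
paired_test <- t.test(x = g1, y = g2, alternative = "two.sided", paired = TRUE)

# Each result is a list; the p-value is stored in $p.value
paired_test$p.value
```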
Figures
- From previous lectures, we have found numeric results
- Today, we’ll learn to show results visually
- Scatterplot
- Boxplot
- Histogram
- ggplot2
Scatterplot
plot(x = __, y = ___, main = "__", xlab = "__", ylab = "__")
- Scatterplot of Y vs. X
Example:
- From Beer's law, we expect a graph of optden vs carb to
form a line
- plot(x = Formaldehyde$carb, y = Formaldehyde$optden)
Review of lecture 7
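The same example with the title and axis labels filled in might look like the sketch below. The label text and the variable r are illustrative choices, and the correlation check is an extra step added here to show how close to a line the points are.

```r
# Formaldehyde is a built-in dataset with columns carb and optden
plot(x = Formaldehyde$carb, y = Formaldehyde$optden,
     main = "Beer's law: optden vs carb",
     xlab = "carb (concentration)",
     ylab = "optden (absorbance)")

# A correlation close to 1 is consistent with the points forming a line
r <- cor(Formaldehyde$carb, Formaldehyde$optden)
r
```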
plot() R Documentation
type
what type of plot should be drawn. Possible types are
"p" for points,
"l" for lines,
"b" for both,
"c" for the lines part alone of "b",
"o" for both ‘overplotted’,
"h" for ‘histogram’ like (or ‘high-density’) vertical lines,
"s" for stair steps,
"S" for other steps, see ‘Details’ below,
"n" for no plotting.
All other types give a warning or an error;
using, e.g., type = "punkte" being equivalent
to type = "p" for S compatibility. Note that
some methods, e.g. plot.factor, do not accept
this.
main
an overall title for the plot: see title.
sub
a sub title for the plot: see title.
xlab
a title for the x axis: see title.
ylab
a title for the y axis: see title.
asp
the y/x aspect ratio, see plot.window.
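To see what a few of these arguments do, here is a small sketch on the Formaldehyde data. The title and label text are placeholders, and points() is a base R function not covered in the lecture; it is used only to show that type = "n" leaves the plot empty.

```r
# type = "b" draws both the points and the lines connecting them
plot(x = Formaldehyde$carb, y = Formaldehyde$optden,
     type = "b",
     main = "optden vs carb",
     sub = "points joined by lines",
     xlab = "carb", ylab = "optden")

# type = "n" sets up the axes but plots nothing...
plot(x = Formaldehyde$carb, y = Formaldehyde$optden,
     type = "n", xlab = "carb", ylab = "optden")
# ...so the data can be added afterwards, here with points()
points(x = Formaldehyde$carb, y = Formaldehyde$optden)
```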
Boxplots
boxplot(x = __, main = "__", xlab = "__", ylab = "__")
- Boxplot of X
Example:
- Sleep dataset
- boxplot(x = sleep$extra, main = "Boxplot of sleep changes",
ylab = "hours")
We can also subset the data into the control and experimental groups to create 2 boxplots, one per group
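One way to do the subsetting, sketched with square-bracket indexing on the group column of sleep. The names group1 and group2 and the box labels are illustrative.

```r
# Pull out the extra values for each level of the group column
group1 <- sleep$extra[sleep$group == 1]
group2 <- sleep$extra[sleep$group == 2]

# Passing two vectors to boxplot() draws two boxes side by side
boxplot(group1, group2,
        names = c("group 1", "group 2"),
        main = "Boxplot of sleep changes by group",
        ylab = "hours")
```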
boxplot() R documentation
formula
a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the
grouping variable grp (usually a factor). Note that ~ g1 + g2 is equivalent to g1:g2.
data
a data.frame (or list) from which the variables in formula should be taken.
subset
an optional vector specifying a subset of observations to be used for plotting.
na.action
a function which indicates what should happen when the data contain NAs. The default is to ignore missing
values in either the response or the group.
xlab, ylab
x- and y-axis annotation, since R 3.6.0 with a non-empty default. Can be suppressed by ann=FALSE.
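The formula and data arguments give a shorter route to the same kind of grouped boxplot. A minimal sketch, assuming the sleep dataset; the variable b and the subset condition extra > 0 are arbitrary choices made only for illustration.

```r
# extra ~ group means "split extra by group"; data says where to find both columns
b <- boxplot(extra ~ group, data = sleep,
             main = "Boxplot of sleep changes by group",
             xlab = "group", ylab = "hours")

# boxplot() invisibly returns the numbers behind the plot
b$names  # the group labels
b$n      # how many observations went into each box

# subset keeps only the rows that satisfy a condition (here: positive changes)
boxplot(extra ~ group, data = sleep, subset = extra > 0,
        main = "Positive sleep changes only",
        xlab = "group", ylab = "hours")
```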
Histograms
hist(x = __, main = "__", xlab = "__", ylab = "__")
- Histogram of X
Example:
- Sleep dataset
- hist(x = sleep$extra, main = "Histogram of sleep changes",
xlab = "hours")
We can also subset the data into the control and experimental groups to create 2 histograms, one per group
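A sketch of the two histograms drawn side by side. par(mfrow = ...) is a base R setting not covered in the lecture; it is an assumption here, used only to place both plots in one window.

```r
# Show 1 row and 2 columns of plots; keep the old settings so they can be restored
old_par <- par(mfrow = c(1, 2))

hist(x = sleep$extra[sleep$group == 1],
     main = "Group 1", xlab = "hours")
hist(x = sleep$extra[sleep$group == 2],
     main = "Group 2", xlab = "hours")

# Go back to one plot per window
par(old_par)
```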
hist() R documentation
x
a vector of values for which the histogram is desired.
breaks
one of:
a vector giving the breakpoints between histogram cells,
a function to compute the vector of breakpoints,
a single number giving the number of cells for the histogram,
a character string naming an algorithm to compute the number of cells (see ‘Details’),
a function to compute the number of cells.
In the last three cases the number is a suggestion only; as the breakpoints will be set to pretty
values, the number is limited to 1e6 (with a warning if it was larger). If breaks is a function, the x
vector is supplied to it as the only argument (and the number of breaks is only limited by the amount
of available memory).
...
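The difference between giving breaks as a single number and as a vector can be sketched as follows. The variables h1 and h2 and the particular break values are illustrative.

```r
# A single number is only a suggestion; R picks "pretty" breakpoints near it
h1 <- hist(x = sleep$extra, breaks = 5,
           main = "breaks = 5 (suggestion)", xlab = "hours")
h1$breaks  # the breakpoints R actually used

# A vector gives the exact breakpoints; it must cover the whole range of x
h2 <- hist(x = sleep$extra, breaks = seq(-2, 6, by = 1),
           main = "breaks = seq(-2, 6, by = 1)", xlab = "hours")
h2$counts  # number of observations in each cell
```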
Your Turn: create a figure
- Create a plot of your choosing using data of your
choosing
- Potential data: sleep, Formaldehyde,
ChickWeight, heart
Hint:
plot(x = __, y = ___, main = "__", xlab = "__", ylab = "__")
boxplot(x = __, main = "__", xlab = "__", ylab = "__")
hist(x = __, main = "__", xlab = "__", ylab = "__")
ggplot2
- So far, the figures we have made look very crude
- For scientific publications, we want figures that look nicer
- ggplot2: a package that helps with data visualization
Image source: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
ggplot2
- How to start:
install.packages("ggplot2") # installs the package on your computer
library(ggplot2) # loads the package for use
- Put these two lines, exactly as shown, at the top of your R script
- You only need to run the first line once, but you need to run the second line every time you reopen the script in a new R session
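If you would rather not re-run the install line by accident, one possible pattern is to install only when the package is missing. This is a sketch, not something the lecture requires; requireNamespace() is a base R function that checks whether a package is available.

```r
# Install ggplot2 only if it is not already installed
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package; this is needed in every new R session
library(ggplot2)
```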
Ggplot2: ggplot()
- ggplot(dataset, aes(x = __, y = __)) creates a blank plot
- Aesthetics (aes) map variables in your dataset to features of the plot, such as position and color
- Use aes() to specify what your x and/or y variables are
- We then add stuff to the plot by adding layers
- + geom_point() adds points (scatterplot)
Ggplot2: Formaldehyde scatterplot
- ggplot(dataset, aes(x = __, y = __)) + geom_point()
Example: optden (absorbance) vs carb (concentration) in
Formaldehyde dataset
ggplot(Formaldehyde, aes(x = carb, y = optden))
ggplot(Formaldehyde, aes(x = carb, y = optden)) + geom_point()
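The idea of "blank plot plus layers" can be checked directly by storing each step in a variable. A minimal sketch: blank and with_points are illustrative names, and looking at $layers is just one way to inspect a ggplot object.

```r
library(ggplot2)

# Step 1: data and aesthetics only, so the plot has axes but no points
blank <- ggplot(Formaldehyde, aes(x = carb, y = optden))

# Step 2: adding geom_point() adds one layer on top of the blank plot
with_points <- blank + geom_point()

# Count the layers stored in each plot object
length(blank$layers)
length(with_points$layers)

# Draw the finished scatterplot
with_points
```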
Ggplot2: scatterplot points
- ggplot(dataset, aes(x = __, y = __)) +
geom_point(col = "__", size = __)
- Add arguments to geom_point() to specify features of the points
- col (or color) = the color of the points
- size = the size of the points
- If a line ends with a +, the next function can go on a new line
Example: Formaldehyde Absorbance vs Concentration
ggplot(Formaldehyde, aes(x = carb, y = optden)) +
geom_point(col = "purple", size = 3)
Ggplot2 colors
Source: https://www.r-graph-gallery.com/ggplot2-color.html
Ggplot2: scatter plot line
- ggplot(dataset, aes(x = __, y = __)) +
geom_point(col = “__”, size = __) +
geom_smooth(method="lm", col = "__")
- Add linear model (with 95% CI) to plot
Example:
ggplot(Formaldehyde, aes(x = carb, y = optden)) +
geom_point(col = "purple", size = 3) +
geom_smooth(method="lm", col = "red")
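The red line is a least-squares fit of y on x, so it should correspond to what lm() gives for the same data. A sketch under that assumption: fit and p are illustrative names, and se = FALSE and formula = y ~ x are geom_smooth() arguments not used in the lecture, shown here only as options.

```r
library(ggplot2)

# Fit the same straight line directly with lm()
fit <- lm(optden ~ carb, data = Formaldehyde)
coef(fit)  # intercept and slope of the fitted line

# se = FALSE hides the 95% confidence band and shows only the line;
# formula = y ~ x states the straight-line model explicitly
p <- ggplot(Formaldehyde, aes(x = carb, y = optden)) +
  geom_point(col = "purple", size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, col = "red")
p
```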
Ggplot2: variables
- Store the plot in a variable, then draw it with plot(variable_containing_ggplot)
Example:
# The code is getting long, so let's store our plot in a variable
g <- ggplot(Formaldehyde, aes(x = carb, y = optden)) +
geom_point(col = "purple", size = 3) +
geom_smooth(method="lm", col = "red")
plot(g)
Ggplot2: labels
+ labs(title="TITLE", subtitle="SUBTITLE", y="Y AXIS LABEL",
x="X AXIS LABEL", caption="CAPTION")
Example:
g + labs(title="Beer's Law", subtitle="This is a ggplot2 subtitle",
y="Absorbance", x="Concentration",
caption="Caption here under plot")
You won’t be tested on this, but for fun I wanted to show BMI vs cholesterol stratified by smoking status (from lecture 7)
Your Turn: create a ggplot2 scatterplot
- Feel free to use any dataset and any variables within
that dataset
Hint:
install.packages("ggplot2")
library(ggplot2)
ggplot(Formaldehyde, aes(x = carb, y = optden)) +
geom_point(col = "purple", size = 3) +
geom_smooth(method="lm", col = "red") +
labs(title="Beer's Law", subtitle="This is a ggplot2 subtitle", y="Absorbance",
x="Concentration", caption="Caption here under plot")
Ggplot2: Boxplot
- ggplot(dataset, aes(x = __, y = __)) creates a blank plot
- Aesthetics (aes) map variables in your dataset to features of the plot, such as position and color
- Use aes() to specify what your x and y variables are
- We then add stuff to the plot by adding layers
- + geom_point() adds points (scatterplot)
- + geom_boxplot() adds boxplots
Ggplot2: Boxplot with both x and y axes
Use aes(col = ___) to specify which variable determines the color
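A minimal sketch of a boxplot with both axes and a color mapping, assuming the built-in sleep dataset used elsewhere in this lecture. The second plot uses fill instead of col; fill is a standard ggplot2 aesthetic, included here only for comparison.

```r
library(ggplot2)

# x = group puts one box per group; col = group colors the box outlines by group
p_col <- ggplot(sleep, aes(x = group, y = extra, col = group)) +
  geom_boxplot()
p_col

# fill = group colors the inside of each box instead of the outline
p_fill <- ggplot(sleep, aes(x = group, y = extra, fill = group)) +
  geom_boxplot()
p_fill
```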
Ggplot2: Boxplot more layers
- Layers from when we created a scatterplot can also be used for
boxplots
Example:
ggplot(sleep, aes(x = group, y = extra, color=group)) +
geom_boxplot() +
labs(title = "Sleep Changes in the Control and Experimental Group") +
geom_point(col = "black")
Ggplot2: Boxplot Color Palette
Ggplot2 palettes
Source: http://r-statistics.co/Complete-Ggplot2-Tutorial-Part1-With-R-Code.html
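A palette is added as one more layer. The sketch below assumes the sleep boxplot from the previous slides and picks the "Dark2" palette arbitrarily; any ColorBrewer palette name can be used in its place.

```r
library(ggplot2)

# scale_colour_brewer() replaces the default colors used for the color aesthetic
p <- ggplot(sleep, aes(x = group, y = extra, color = group)) +
  geom_boxplot() +
  scale_colour_brewer(palette = "Dark2")
p
```

If the plot uses fill = group instead of color = group, the matching function is scale_fill_brewer().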
Your Turn: create a ggplot2 boxplot
- Feel free to use any dataset and any variables within
that dataset
Hint:
install.packages("ggplot2")
library(ggplot2)
#Remove color = __
ggplot(sleep, aes(x = group, y = extra, color=group)) +
geom_boxplot() +
labs(title = "Sleep Changes in the Control and Experimental Group") +
scale_colour_brewer(palette = "Pastel1")
Ggplot2: Histogram
- ggplot(dataset, aes(x = __, y = __)) creates a blank plot
- Aesthetics (aes) map variables in your dataset to features of the plot, such as position and color
- Use aes() to specify what your x and y variables are
- We then add stuff to the plot by adding layers
- + geom_point() adds points (scatterplot)
- + geom_boxplot() adds boxplots
- + geom_histogram() creates histogram
Your Turn: create a ggplot2 histogram
- Feel free to use any dataset and any variables within
that dataset
Hint:
install.packages("ggplot2")
library(ggplot2)
ggplot(sleep, aes(x=extra)) + geom_histogram()
ggplot(sleep, aes(x=extra, color=group, fill = group)) +
geom_histogram(bins = 8)
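The number of bars can be controlled either by count (bins) or by width (binwidth). A sketch with illustrative variable names; position = "identity" and alpha are geom_histogram() options not covered in the lecture, used here so the two groups overlap visibly instead of stacking.

```r
library(ggplot2)

# bins sets how many bars to use
p_bins <- ggplot(sleep, aes(x = extra)) +
  geom_histogram(bins = 8)

# binwidth sets how wide each bar is, in the units of x (here: hours)
p_width <- ggplot(sleep, aes(x = extra)) +
  geom_histogram(binwidth = 1)

# By default the groups are stacked; position = "identity" overlays them,
# and alpha = 0.5 makes the bars half transparent so both stay visible
p_groups <- ggplot(sleep, aes(x = extra, fill = group)) +
  geom_histogram(bins = 8, position = "identity", alpha = 0.5)

p_bins
p_width
p_groups
```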
Just for fun: you can add all sorts of layers, but it’s up to you (the human) to make sure what you’re doing is correct for your research.
I put points on top of a boxplot on top of a histogram. R doesn’t care and will do it.
Check out the posted cheatsheet for more ggplot functions
Ggplot2: Use R Documentation!
https://drsimonj.svbtle.com/pretty-scatter-plots-with-ggplot2
https://economicsfromthetopdown.com/2019/07/30/making-beautiful-charts-using-r-ggplot/
https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/
https://www.r-spatial.org/r/2018/10/25/ggplot2-sf-2.html
https://mode.com/blog/r-ggplot-extension-packages/
https://vodkhang.com/datavisual/linkedin-network-analysis-with-ggplot2
Unlimited possibilities!